The bilingual situation in Illinois is described briefly, and an outline of the instructional objectives of local bilingual programs is given. The programs are to be (1) measurable and oriented toward the end-of-year product, and (2) organized within the guidelines for state-funded bilingual programs. The main part of the report describes the design of the procedures set up to evaluate these programs, based on the following recommendations from the Office of the Superintendent of Public Instruction: (1) prior to implementing a bilingual program in a community, a sociolinguistic survey should be conducted there; (2) priority should be given to early childhood programs, preferably preschool and kindergarten; (3) "standardized" instruments, rather than criterion-referenced tests, should be selected as measurement tools; and (4) insofar as possible, a true experimental evaluation design should be employed, with randomly assigned treatment and control groups. The aim was to select and implement the combination of designs and instruments that would most effectively give an accurate picture of local bilingual education programs. Actual evaluation findings are not reported here. Anticipated design refinements for future years are mentioned, and three tables give (1) a description of the measuring instruments, (2) statewide evaluation designs and project sites, and (3) between-groups hypotheses. (TL)
Unlike those in most other states with large non-English-speaking populations, most Illinois bilingual programs are funded from state revenues. In the short span of three years, state funds for bilingual education have increased dramatically, from $200,000 to $2,370,000. At this writing (February 1973), forty-nine bilingual programs are state funded, nine are federally funded (ESEA Title VII), and one is funded by the Chicago Board of Education. (The city of Chicago also contributes to some of the other bilingual programs above the city-wide per capita expenditure level.) Twenty-eight of the fifty-nine bilingual programs are outside the city of Chicago. Most of these "downstate" programs fall within the wide geographic band which stretches west to Moline on the Iowa border, north to Waukegan and Rockford near the Wisconsin border, and south to Joliet. A few programs go as far south as Danville and Arcola.



NOTE: Since this paper was written, the Illinois General Assembly appropriated $6,000,000 for bilingual programs for FY-74. This additional revenue allowed the number of Chicago projects to increase to 57, and the downstate projects to 35. The number of children served in bilingual programs jumped from 5,000 to 16,000.
Between two-thirds and three-fourths of the children who need a bilingual program live in Chicago. Headcounts have identified 65,000 of these children in the Chicago Spanish-speaking community alone. Schools need help as they try to meet the special educational needs of children who, because they understand another language and have learned the values of another culture, will not approach their own potential for learning in our traditional English-language curriculum. Of the estimated 100,000 Illinois children from non-English-speaking backgrounds, less than six percent are currently enrolled in a bilingual program.

The instructional objectives of bilingual programs are developed by each project to suit its local needs. This is accomplished within two constraints: the objectives are to be measurable and oriented toward the end-of-year product, and they are to be organized under the appropriate goal described in the state guidelines for all bilingual programs seeking state reimbursement. There are seven of these goals:

(1) Children in the bilingual program will achieve fluency and literacy in two languages.

(2) Children in the bilingual program will achieve at a rate commensurate with their own age, ability, and grade level in all school subject areas.

(3) Children in the bilingual program will demonstrate growth in self-esteem.

(4) Children in the bilingual program will be provided with a coordinated and integrated learning environment through effective coordination with the regular school program.

(5) All teachers and staff members of participating schools will be involved in a comprehensive in-service training program.




(6) Parents and other community members will be involved in the planning, implementation, and evaluation of the bilingual program.
(7) Each bilingual project will implement an evaluation to assess its effectiveness.

Many of the negative findings reported by recent studies of compensatory educational programs and experiments in performance contracting (e.g., Garfinkel, 1972) have been criticized as chronologically premature and analytically faulty (Campbell and Erlebacher, 1970; Campbell and Frey, 1970; O'Connor and Klein, 1972). The critics underscore the need for alternate procedures in data analysis and interpretation. Wrightstone ([illegible]) and Fitzgibbon ([illegible]) outline a number of cautions and suggest preferable procedures to be employed in measurement tasks, especially in the use of standardized tests for the purpose of evaluating reform programs. All these studies claim that compensatory and performance contracting programs have not been given a fair chance. Evaluation for accountability must be improved through a more appropriate use of standardized or non-standardized instruments, better experimental designs, and more appropriate procedures for data analysis.

A unique evaluation design has been deployed in Illinois' bilingual education programs. The major thrust of this design, as the title indicates, is in instrument assessment and in varying the quasi-experimental designs. In addition to a discussion of these two areas, this report will touch on a number of factors involved in developing the evaluation design.



The evaluation of bilingual programs has been given very high priority. Even before the Illinois legislature passed the bills which would appropriate funds for bilingual education (the governor subsequently signed them into law in September of 1971), acknowledged authorities in evaluation design were consulted by the newly formed Bilingual Education Section of the Office of the Superintendent of Public Instruction. Among those experts who gave of their time were: Donald T. Campbell, Thomas Cook, Philip Brickman, and Lee Sechrest, all from the social psychology department of Northwestern University; Marilynn B. Brewer, from the psychology department of [illegible] University; G. Richard Tucker and Wallace Lambert, psycholinguists from McGill University; and Robert Cooper, a linguist from Stanford University.


Four general recommendations emerged from these consultations:
first, that prior to implementing a bilingual program in a community, a sociolinguistic survey be conducted there;
second, that priority be given to early childhood programs, preferably preschool and kindergarten;
third, that "standardized" instruments, rather than criterion-referenced tests, be selected as measurement tools;
fourth, that insofar as possible, a true experimental evaluation design be employed, with randomly assigned treatment and control groups.
This paper will discuss what was planned for the state-funded bilingual programs in each of these four areas, with most of the discussion centering on the areas of instrumentation and design. Evaluation findings are not reported in this paper.

The evaluation plans described here were developed principally in the five months in 1971 which preceded implementation of the bilingual programs; the design has been "tuned up" periodically since then. The evaluation design developed during this period was to be deployed for the first two years of the programs' existence, fiscal years 1972-73. The emphasis is heavily on a method to ascertain whether cognitive achievement is enhanced by attending a bilingual program. The important area of affective growth will be deferred to a later period of inquiry, owing to the scarcity of adequate attitudinal measures appropriate for Illinois "bilingual" children and to the pressing need to determine how academic achievement was affected by the program. (While supporters of bilingual programs were decidedly interested in how self-esteem is affected by the program, those who were reserving their support were much more concerned about cognitive development.)

Sociolinguistic Surveys.

A sociolinguistic survey was not conducted prior to implementation of bilingual programs. Both advantages and drawbacks of such surveys were discussed. The advantages of conducting a sociolinguistic survey among the target communities were: (1) it could provide a means of collecting data on variables whose description was important to the evaluation design; (2) it could provide information relevant to determining program content; and (3) it could provide both a vehicle for informing the bilingual community of the possibilities of initiating a bilingual program and a means of gaining community support for the program.

The drawbacks of conducting a sociolinguistic survey included the following: (1) growing resentment in Spanish-speaking communities toward information-gathering surveys; (2) modest expectations of learning anything unexpected through the survey, given the likelihood that an Illinois survey would replicate antecedent surveys; and (3) the timeline imposed upon the state office by circumstance, which would not allow time to initiate any fundamental program changes that might be suggested by the survey findings.

Alternate ways to achieve the results looked for in a sociolinguistic survey were then proposed. Collection of demographic data would be effected with the assistance of local teachers and administrators after the program got on its feet. Bilingual balance and language domain information would be gathered through student questionnaires and recordings of student speech samples. Local communities would be informed through letters from schools, visits by bilingual teachers and aides, newspaper stories, and involvement in local bilingual advisory bodies. Program change would occur whenever input seemed to warrant it. (An assessment of the success of these alternate techniques will be made in a subsequent report.)

Early Childhood Priority.

There was general agreement among the state staff, the state advisory council, and outside consultants that in all probability both the short-term and the long-range effectiveness of bilingual programs would be greater with younger children. The idea was to begin a program before the all-too-common deleterious effects of regular programs take their toll. Research (Hunt, 1961; Bloom, 1964; Karnes, Hodgins, and Teska, 1969) has clearly demonstrated that the early years are the most educationally formative ones. In the area of foreign languages especially, elementary school programs have repeatedly shown this to be sound. It is at this level of education that parental interest in their children's educational development is at its most intense. Opportunities to study incremental, or follow-up, effects of bilingual education are, of course, greatly enhanced by beginning programs early.

On the other hand, Illinois does not have a tradition of public preschools. Mandatory attendance begins with first grade, and up to the year 1970, local school districts were not required to provide kindergarten experience for children of parents who desired it.


It was decided to concentrate most of the resources available in FY-72 on the K-3 level. (Two secondary projects were funded in Chicago.) In FY-73, a number of preschool bilingual projects were funded, and most existing programs were extended to K-6. (One additional secondary program was funded in Chicago, and one dropout prevention program was funded downstate.)

Having decided, largely because of the time factor, not to attempt a sociolinguistic survey of selected Spanish-speaking communities, and having set priorities for funding at the primary level, we focused our attention on the problem of which instruments to select to measure the cognitive growth of "bilingual" children.

Selection of Instruments.

Input variables. One selects instruments to test a specific population. The population to be tested in this case consists of Illinois children of Spanish-speaking background. Yet an educational program that works well for a Cuban youngster may not be equally effective with Chicano children. The program may be more effective with children of one age than another. Achievement of the product-oriented goals listed earlier depends on the initial (i.e., pretest) language ability in both English and Spanish. Eight different variables which help describe the student are identified in this design as input variables:

(1) Grade: preschool through 6th grade.
(2) Sex: male and female.
(3) District: 1 through 22.
(4) Treatment: Bilingual, TESL, and TERC (Teaching English in Regular Classroom).
(5) Ethnicity: Mexican, Puerto Rican, Cuban, U.S. Latin, Other Latin, and Anglo.
(6) Residency in U.S.: port of entry, 1/4 of student life, 1/2 of student life, 3/4 of student life, and all of student life.
(7) English language proficiency: 3-point scale on teacher rating, and 10-point scale on self-rating.
(8) Spanish language proficiency: 3-point scale on teacher rating, and 10-point scale on self-rating.



Outcome variables. In spite of the current vogue for criterion-referenced tests, the lack of agreement over what a student should be able to do after a given amount of exposure to a bilingual program made it impractical to base a statewide evaluation on widely disparate, and often nonexistent, teacher-made or criterion-referenced tests. The general areas to be tested are identified in this design as outcome variables.

The three product-oriented goals of the Illinois bilingual education programs are goals 1 through 3 listed earlier in this report. Pretest-to-posttest changes in the following outcome variables will be evaluated:



Preschool grades:
(1) Position in the developmental scale (i.e., year of implementation).

Grades K and 1:
(2) Basic concepts in Spanish language.
(3) Basic concepts in English language.
(4) Basic concepts in mathematics, measured in Spanish.
(5) Basic concepts in mathematics, measured in English.
(6) Self-concept.

Grades 2 through 6:
(7) English language reading.
(8) Spanish language reading.
(9) Mathematics, measured bilingually.

Grades 2 through 4:
(10) Self-concept.

Grades 4 through 6:
(11) Self-concept.
(12) Attitude.
(13) Study habits.
(14) Level of aspiration.



Since achievement in the bilingual program is to some extent a function of pretest standing and general intelligence, verbal and non-verbal intelligence at pretest time (FY-72 only) and pretest scores on the dependent variables are treated as covariates in the evaluation.
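The report names its covariates but not the adjustment procedure. As a minimal sketch, assuming a single covariate (a pretest score) and the classical analysis-of-covariance adjustment with a pooled within-group slope, a covariate-adjusted program effect could be computed as follows. The function name and data are illustrative; with several covariates (verbal and non-verbal intelligence plus pretests) one would use multiple regression instead.

```python
def adjusted_effect(program, comparison):
    """ANCOVA-style adjusted difference in posttest means, one covariate.

    program, comparison: lists of (pretest, posttest) pairs, one per student.
    Returns (adjusted_effect, pooled_within_group_slope).
    """
    def summarize(group):
        n = len(group)
        mean_pre = sum(pre for pre, _ in group) / n
        mean_post = sum(post for _, post in group) / n
        # within-group cross-products and sum of squares of the covariate
        sxy = sum((pre - mean_pre) * (post - mean_post) for pre, post in group)
        sxx = sum((pre - mean_pre) ** 2 for pre, _ in group)
        return mean_pre, mean_post, sxy, sxx

    pre_p, post_p, sxy_p, sxx_p = summarize(program)
    pre_c, post_c, sxy_c, sxx_c = summarize(comparison)
    slope = (sxy_p + sxy_c) / (sxx_p + sxx_c)  # pooled within-group regression slope
    # raw posttest difference, corrected for the groups' pretest difference
    effect = (post_p - post_c) - slope * (pre_p - pre_c)
    return effect, slope


# Hypothetical data: the program group starts 10 points lower on the pretest.
program = [(10, 25), (20, 35), (30, 45)]
comparison = [(20, 30), (30, 40), (40, 50)]
print(adjusted_effect(program, comparison))  # (5.0, 1.0)
```

In this made-up example the raw posttest means favor the comparison group by 5 points, but after allowing for the 10-point pretest gap the sketch credits the program with 5 points. The adjustment is only as trustworthy as its assumption of a common linear slope in both groups.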
It seemed uneconomical to consider development of new norm-referenced instruments until an adequate assessment of existing instruments was completed. Samples were requested of every standardized test whose use was reported by a bilingual project anywhere in the U.S. (Plakos, 1971). Tests were also identified through the reviews in the Mental Measurements Yearbooks (Buros, 1965, 1972) and the UCLA Center for the Study of Evaluation handbooks (1970, 1971). These instruments were classified according to what they purportedly measured and their appropriateness for children at the elementary school level. Each instrument which promised to measure something relevant to the envisioned bilingual programs was studied, item by item, by a team of bilingual-bicultural psychologists. (Rafaela Elizondo de Weffer and Ana Belkind did most of this.)

A list of the instruments which were selected for use in most of the state programs operating at the elementary level is given in Table I.

It is immediately obvious that a test instrument which assumes fluency in a language the testee does not understand invites gross misrepresentation of the testee's cognitive skills in areas other than language. Moreover, the cultural, and often linguistic, inadequacy of translated tests is widely appreciated. Then again, since no standardized instrument has been normed on Illinois' multi-ethnic children of Spanish-language background, how would test scores be interpreted?

This sticky language problem is greatly compounded by the broad continuum of fluency in both English and Spanish over which Illinois' "bilingual" children are spread. For every conceivable point on the continuum there is some child in Illinois whose relative English/Spanish fluency would place him there.
The general solution to these problems was suggested by Rafaela Elizondo de Weffer and consists of alternating the language for every other item on a number of the tests. This technique has the potential of (a) reducing test anxiety and frustration due to weakness in one of the two languages, (b) reducing the time needed for testing, (c) reducing testing cost, and (d) providing data on the relative dominance of each language, as well as data on the test's content. This technique also requires bilingual test administrators, thus avoiding difficulties in communication between tester and tested. Appropriate checks to evaluate the effectiveness of this alternate-language technique will be applied.
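The alternate-language procedure is described only in outline. A minimal sketch, assuming that every other item switches language starting from a chosen language, and that a per-language proportion correct serves as a rough dominance indicator (the scoring rule and all names here are illustrative, not the procedure actually used):

```python
from dataclasses import dataclass


@dataclass
class ItemResult:
    language: str  # language in which the item was administered
    correct: bool


def assign_languages(n_items, start="Spanish"):
    """Alternate the administration language item by item, beginning with `start`."""
    other = "English" if start == "Spanish" else "Spanish"
    return [start if i % 2 == 0 else other for i in range(n_items)]


def language_subscores(results):
    """Proportion of items answered correctly in each language."""
    given, right = {}, {}
    for r in results:
        given[r.language] = given.get(r.language, 0) + 1
        right[r.language] = right.get(r.language, 0) + int(r.correct)
    return {lang: right[lang] / given[lang] for lang in given}


langs = assign_languages(6)
answers = [True, False, True, True, True, False]  # hypothetical responses
results = [ItemResult(lang, ok) for lang, ok in zip(langs, answers)]
print(language_subscores(results))
```

Because the two languages are attached to different items, a gap between the subscores mixes language dominance with item difficulty. The `start` parameter lets the two language orders be administered to different children and compared, which is the question hypothesis (9) raises.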



The hypotheses developed to probe the strengths and weaknesses of the selected instruments include the following:

(1) The standardized tests selected for the battery are appropriate for measuring the outcomes of bilingual programs. (Appropriateness is considered in terms of item analysis, effect of random response on scores, cultural loading, and set response patterns.)

(2) Oral examinations are superior to written examinations in eliciting maximum performance in bilingual populations.

(3) Appropriate coding of circles drawn to represent self in different situations constitutes a valid measure of the relative self-esteem of bilingual students in the respective situations.

(4) Data from the Dailey Language Facility Test can be validly interpreted for degree of bilingual balance and personality characteristics as well as for language facility.

(5) In grades 2 and 3, test performance is more related to language proficiency than to grade level, contrary to the classical construct that proficiency (i.e., test performance) increases as grade level increases.

(6) Non-verbal tests are more appropriate than verbal tests for measuring the general ability of bilingual children.

(7) Alternating items between two languages within the same test is a more effective procedure for administering tests to bilingual student populations than the single-language procedure.

(8) Alternating items between two languages within the same test does not affect the reliability of the test.

(9) The sequence of the two languages in testing bilingual populations by the alternate-language procedure does not affect performance in either language.

(10) Scores on the numerical ability subtest of the Inter-American General Ability Test are a valid index of the mathematics achievement of bilingual students.

The testing periods were set for January 1972, May 1972, October 1972, January 1973 (downstate only), and May 1973. The test-taking time for each student per testing period averages two and one-half hours. This is generally split between two days to avoid fatigue. Testing is administered by bilingual-bicultural testers who have received in-service training in the techniques to be used with the instruments. (The initial testing period, January 1972, took place some six weeks after the bilingual programs began. An important function of this delay was to reduce testee anxiety.)

(Because of this time-series design, a report of program effects would suffer a two-year delay. To get an advance indication of how the program was going, a preliminary evaluation report was presented. This report was based on a study of the test data of first graders from eleven downstate programs. See Weffer, 1972.)

Before test data from these instruments can be interpreted in terms of the achievement of Illinois children of Hispanic background, the reliability of the instruments must be determined. To assess reliability, KR-20 and split-half techniques are being applied to each of the instruments and their subtests, and correlations are being determined for all instruments and subtests. Data from the first testing period are being used for this purpose. The more numerous test data of the third testing period will be used to replicate the initial findings. (First-testing-period data will be based exclusively on downstate scores, while the third-period data will include both Chicago and downstate scores.) Finally, norms based on the performance of Illinois children of Hispanic background will be established with the data from the third testing period.
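For readers unfamiliar with the two reliability indices, the sketch below computes both for items scored right (1) or wrong (0). It assumes KR-20 with the population variance of total scores, and an odd-even split stepped up by the Spearman-Brown formula; the report does not say how its halves were formed, and the response matrix is made up.

```python
from statistics import mean, pvariance


def kr20(responses):
    """Kuder-Richardson formula 20 for 0/1 item scores (one row per student)."""
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    var_total = pvariance(totals)
    if var_total == 0:
        raise ValueError("total scores do not vary; KR-20 is undefined")
    # sum over items of p * q, where p is the proportion answering correctly
    pq = 0.0
    for j in range(k):
        p = mean(row[j] for row in responses)
        pq += p * (1 - p)
    return (k / (k - 1)) * (1 - pq / var_total)


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def split_half(responses):
    """Odd-even split-half reliability, stepped up with the Spearman-Brown formula."""
    odd = [sum(row[0::2]) for row in responses]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in responses]  # items 2, 4, 6, ...
    r = pearson(odd, even)
    return 2 * r / (1 + r)


scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(scores), 3), round(split_half(scores), 3))  # 0.838 0.921
```

Running both indices on the same data, once with items alternated between languages and once without, is one way hypothesis (8) could be checked.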

Test reliability answers the question of how dependable the test scores are, that is, how much fluctuation can be expected in a given instrument. But high test reliability does not necessarily indicate that the test is measuring what the testers want it to measure. That is a question of test validity.

Whether the selected instruments in fact measure content and skills which are central to the objectives of the bilingual programs as actually implemented needs to be demonstrated. Indices of the validity of these instruments will be attempted in several ways. Test scores will be correlated with teacher grades; teachers will assess the relevance of the purported test objectives via questionnaires; and a committee of teachers will evaluate the tests on the basis of an examination of the cultural and/or linguistic biases of the test items.
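The first of these validity checks, correlating test scores with teacher grades, reduces to a Pearson correlation over students who have both. A sketch with hypothetical subtest scores and teacher grades on an assumed 0-4 scale:

```python
from math import sqrt


def validity_coefficient(test_scores, teacher_grades):
    """Pearson correlation between test scores and teacher grades (same students, same order)."""
    if len(test_scores) != len(teacher_grades) or len(test_scores) < 2:
        raise ValueError("need paired scores for at least two students")
    n = len(test_scores)
    mx = sum(test_scores) / n
    my = sum(teacher_grades) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(test_scores, teacher_grades))
    sxx = sum((x - mx) ** 2 for x in test_scores)
    syy = sum((y - my) ** 2 for y in teacher_grades)
    return sxy / sqrt(sxx * syy)


scores = [12, 18, 25, 31, 40]  # hypothetical subtest raw scores
grades = [1, 2, 2, 3, 4]       # hypothetical teacher grades (0-4 scale)
print(round(validity_coefficient(scores, grades), 2))
```

A high coefficient supports, but does not establish, validity, since teacher grades can carry the same linguistic and cultural biases the committee review is meant to detect.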




Evaluation Designs.

Programs are evaluated so that changes can be made which will enhance their effectiveness. Since there is widespread interest in the worth of bilingual education, an evaluation design was sought which would permit broad generalizations as to treatment effect. The fundamental policy questions to be answered were: (1) Can the achievement of children of Hispanic background be adequately measured by existing standardized instruments? (The previous discussion of instrumentation deals with this point.) (2) Do children in bilingual programs learn as much or more in the routine school subjects than they would have learned had they stayed in the regular school program? In addition, baseline data need to be collected on whether the effects of a bilingual program are most noticeable during the first year or so of a child's participation or are incremental, and on whether there is a critical point for beginning bilingual education.

There are two major approaches to controlling for artifacts which lead to a distorted view of bilingual program effects. One approach employs complex statistical techniques, such as path analysis. This technique, introduced into sociology by Otis Dudley Duncan, is exemplified in the recent study by Christopher Jencks et al., Inequality: A Reassessment of the Effect of Family and Schooling in America (1972).

The other approach is the treatment-comparison group technique. In its simplest form, equivalent subjects in experimental and control conditions are pretested and posttested, and the differences between the groups then become the critical evidence of program effect. The best contemporary exposition of this technique is that of Campbell and Stanley (1963).

The single most potent way to increase the interpretability of a comparison-group design is to assign subjects randomly to treatment (bilingual program) and control (regular school program) conditions. Random assignment makes a "true" experimental design possible, whereas the same design with "comparable" but not randomly assigned control groups is what Campbell calls a "quasi-experimental" design. The results of true experimental designs are, of course, much easier to interpret unequivocally than those of quasi-experimental designs. The relative strength of a quasi-experimental design depends largely on how equivalent the treatment and comparison groups are initially. (The other criterion for judging the strength of a quasi-experimental design is the number of controlled threats to internal and external validity.)


We decided to aim for a true experimental design, à la Campbell and Stanley, insofar as possible. Where random assignment was not feasible, the identification of similar but not equivalent comparison groups was attempted. Since reliability and external validity are enhanced by a large sample representing schools with differing characteristics, all state-funded bilingual programs throughout the state were to be included in the overall design. (A detailed description of the strategies employed to reduce the threats to both internal and external validity for each design, and a discussion of a unique aspect of design manipulation, is being prepared as a separate report.)



The designs as they were planned and implemented (what was implemented was not always what was planned) for each of the bilingual projects funded in FY-72 and/or FY-73 are presented in Table II.

Rationale for multiple designs. There are three main reasons to employ multiple overlapping designs. First, local conditions differ widely, and a design feasible in one school may not be physically possible or politically desirable in another school setting. For example, in one school all eligible students may be enrolled in the program, whereas in another only a fraction may be so enrolled. Second, the evaluator can never be certain in field settings that what begins as a true experiment will end up that way. Because so many field exigencies work to erode or subvert carefully controlled experimental conditions, one has to be prepared with alternate quasi-experimental designs. Third, while no quasi-experimental design adequately controls for each of the nine threats to internal validity and the three threats to external validity (see Campbell and Stanley, 1963), overlapping the designs increases the potential to minimize the strength of rival explanations of the data. A subsequent report will discuss this in much greater detail.

Random assignment. When the degree of relative need is not considered an especially relevant criterion for inclusion in the program (due perhaps to an especially large sample size), students can be randomly selected from a list of subjects approximately twice the size of the group that can ultimately be accommodated in the bilingual program.
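A sketch of that procedure, assuming the school supplies a list of eligible student identifiers roughly twice the program's capacity. A seeded generator is used so that an assignment can be reproduced and audited; the function name, identifiers, and seed are illustrative.

```python
import random


def random_assignment(eligible, capacity, seed=None):
    """Randomly split an eligibility list into a program group and a control group."""
    if len(eligible) <= capacity:
        raise ValueError("the list must be longer than the program capacity")
    rng = random.Random(seed)
    shuffled = list(eligible)
    rng.shuffle(shuffled)
    # the first `capacity` students enter the program; the rest form the control group
    return shuffled[:capacity], shuffled[capacity:]


program, control = random_assignment([f"S{i:03d}" for i in range(60)], capacity=30, seed=1972)
print(len(program), len(control))  # 30 30
```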

The obvious disadvantage of this in schools without twice as many very needy students as the program can handle is that many students who badly need the program will lose their places to others of more marginal need. Schools have not reacted enthusiastically to randomly selected treatment-control groups, and this model was abandoned after an abortive try.




An additional objection to having a randomly selected control group within a school is that the students selected by schools for inclusion in bilingual education programs are generally the most needy, who, because of this, cannot be compared to a group with less need for the program when the purpose of the comparison is to demonstrate the relative efficacy of the treatment.

Random within stratum. For FY-73, a compromise true experimental design was proposed for eight Chicago schools and two downstate schools. (This design was suggested by Donald T. Campbell.) These schools were asked to sort their students of Hispanic background who might benefit from enrollment in a bilingual program into three categories: the most needy; the second most needy; and, lastly, students who would presumably profit from a bilingual program but who have no present hope of being included, given the limited resources available. The criteria for determining need were left to each school.


A typical design of this type in a school which could handle about 150 students in its bilingual program might list 50 children in the first, most-needy category, 20 in the next-most-needy category, and perhaps 500 in the least-needy category. The true experiment occurs within the second category. Here, about half of the students are randomly selected for the bilingual program. Their progress is compared to that of the other half of the same category, who continue in the regular school curriculum. It will be noted that external validity is made more problematic by this design, since the extremes at both ends of the need continuum have been omitted.
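The design can be sketched as follows, assuming each school supplies its three need lists. Only the middle list is randomized, and the treatment effect is estimated by comparing the `experimental` and `control` groups alone; the function name, keys, and seed are illustrative.

```python
import random


def random_within_stratum(most_needy, second_most_needy, least_needy, seed=None):
    """Compromise true experiment: randomize only within the middle need category."""
    rng = random.Random(seed)
    middle = list(second_most_needy)
    rng.shuffle(middle)
    half = len(middle) // 2
    experimental, control = middle[:half], middle[half:]
    return {
        "program": list(most_needy) + experimental,  # everyone the program serves
        "experimental": experimental,                 # randomized treatment group
        "control": control,                           # randomized control group
        "not_served": list(least_needy),
    }


groups = random_within_stratum(["A01", "A02"], [f"B{i:02d}" for i in range(20)], ["C01"], seed=73)
print(len(groups["experimental"]), len(groups["control"]))  # 10 10
```

Children in the most-needy list are always served and never randomized, which is why the estimate applies only to the middle of the need continuum.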

Parallel schools/classes. Comparisons are being attempted where program schools or classes can be matched on a number of socioeconomic variables with nearby non-program schools or classes. There are three downstate districts with bilingual programs in some but not all of the eligible schools. In Chicago, one non-program school has been identified through matching, and two schools have identified parallel classes within the program buildings.
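The report does not describe how the matching was done. One simple possibility, shown here purely as an assumption, is nearest-neighbour matching: standardize each socioeconomic variable and choose the non-program school with the smallest distance from the program school. The variable names and figures are hypothetical.

```python
from math import sqrt
from statistics import pstdev


def closest_match(program_profile, candidates):
    """Name of the candidate school closest to the program school on standardized variables.

    program_profile: {variable: value}; candidates: {school_name: {variable: value}}.
    """
    variables = sorted(program_profile)
    pool = list(candidates.values()) + [program_profile]
    # scale each variable by its spread across all schools so no unit dominates
    scale = {}
    for v in variables:
        sd = pstdev([school[v] for school in pool])
        scale[v] = sd if sd > 0 else 1.0

    def distance(profile):
        return sqrt(sum(((profile[v] - program_profile[v]) / scale[v]) ** 2 for v in variables))

    return min(candidates, key=lambda name: distance(candidates[name]))


program_school = {"pct_low_income": 62, "pct_spanish_surname": 48, "mean_class_size": 31}
nearby = {
    "School A": {"pct_low_income": 30, "pct_spanish_surname": 10, "mean_class_size": 27},
    "School B": {"pct_low_income": 58, "pct_spanish_surname": 41, "mean_class_size": 33},
}
print(closest_match(program_school, nearby))  # School B
```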


Regression-discontinuity. This design takes advantage of situations where a sharp, arbitrary cutoff between subjects who are eligible for the bilingual program and those who are not becomes necessary. One such cutoff point was the result of the policy decision to limit most programs during FY-72 to grades K-3. A second cutoff point is feasible where a school ranks each student in a given grade according to need for the program, then selects the cutoff point which separates program from non-program children. In the few instances where this type of cutoff was implemented, schools were asked to priority-rank twice the number of students that the program could accommodate. Five or ten numbers on each side of the "optimum" cutoff point were then identified, and the cutoff was determined randomly within this band.
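A sketch of the randomized cutoff, assuming the school's priority ranking is a list ordered from most to least needy and that the "optimum" cutoff is expressed as the number of students to admit. The helper name and seed are hypothetical; the chosen cutoff is returned so that any later analysis knows exactly where it fell.

```python
import random


def randomized_cutoff(ranked, optimum, band=5, seed=None):
    """Pick the program/non-program cutoff at random near the school's preferred point.

    ranked: students ordered from most to least needy.
    optimum: number of students the school would ideally admit.
    band: ranks on each side of `optimum` eligible to be the cutoff (5 or 10 in the report).
    """
    rng = random.Random(seed)
    low = max(1, optimum - band)
    high = min(len(ranked) - 1, optimum + band)
    cutoff = rng.randint(low, high)  # inclusive at both ends
    return ranked[:cutoff], ranked[cutoff:], cutoff


ranked = [f"S{i:02d}" for i in range(1, 41)]  # 40 ranked students, twice a capacity of 20
program, others, cutoff = randomized_cutoff(ranked, optimum=20, band=5, seed=72)
print(cutoff, len(program), len(others))
```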

The regression-discontinuity design consists mainly in (1) obtaining
test data on experimental subjects by grade level, (2) obtaining test data
on subjects in adjacent grade levels which are without bilingual programs,
(3) extrapolating the scoring trend of the grade levels experiencing bilingual
programs to non-program levels, and (4) comparing the obtained trend for
non-program grade levels with the trend obtained through extrapolation.
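Steps (3) and (4) can be illustrated with a straight-line fit to grade-level means. This is a sketch only: it assumes a linear trend, codes kindergarten as grade 0, and uses invented means. The report does not specify the fitting method.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def rd_gaps(program_means, nonprogram_means):
    """Extrapolate the program-grade trend to the non-program grades and
    return observed minus extrapolated mean score for each non-program grade."""
    grades = sorted(program_means)
    slope, intercept = fit_line(grades, [program_means[g] for g in grades])
    return {g: observed - (intercept + slope * g)
            for g, observed in sorted(nonprogram_means.items())}

# Hypothetical grade means; kindergarten is coded as grade 0.
program_means = {0: 10.0, 1: 20.0, 2: 30.0, 3: 40.0}  # grades K-3 with the program
nonprogram_means = {4: 44.0, 5: 52.0}                  # adjacent grades without it
print(rd_gaps(program_means, nonprogram_means))        # {4: -6.0, 5: -8.0}
```

Non-program grades falling below the extrapolated line are what a program effect would predict, although such a gap could also have other causes.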

Grade-cohort. This design takes advantage of the fact that the test
data of adjacent grade levels overlap without any systematic bias,
provided the school has not previously maintained the experimental program.
A fourth grade student at the end of the academic year is expected to be
at the fifth grade level as far as his academic achievement is concerned.
As a corollary to this statement, a fifth grade student at the beginning
of the year could be considered to be at the fourth grade level as far as
academic achievement is concerned. Therefore, the pretest scores of the
fifth graders can be compared to the posttest scores of the fourth graders.

The same logic can be applied to the other grade levels. This method of
comparison is feasible for most programs initiated in both FY-72 and FY-73.
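The grade-cohort comparison can be sketched as a difference in means between one grade's posttest and the next grade's pretest, with a Welch t statistic as a rough gauge of size. The choice of Welch's t is an assumption made for illustration, not the report's stated analysis. The scores are hypothetical, and the comparison is only valid when the older cohort has not previously been in the program.

```python
import math
from statistics import mean, variance

def grade_cohort_contrasts(posttest, pretest):
    """Compare each program grade's end-of-year (posttest) scores with the
    next grade's beginning-of-year (pretest) scores.

    posttest, pretest: {grade: [scores]}. Returns {grade: (mean difference, Welch t)}.
    """
    results = {}
    for grade in sorted(posttest):
        if grade + 1 not in pretest:
            continue  # no adjacent cohort to compare against
        post, pre = posttest[grade], pretest[grade + 1]
        diff = mean(post) - mean(pre)
        se = math.sqrt(variance(post) / len(post) + variance(pre) / len(pre))
        results[grade] = (diff, diff / se if se else float("nan"))
    return results

# Hypothetical scores: fourth-grade posttest vs. fifth-grade pretest.
posttest = {4: [52, 55, 58, 61]}
pretest = {5: [48, 50, 52, 54]}
print(grade_cohort_contrasts(posttest, pretest))
```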

Stratified student population. In this design, different populations are
compared for their contrastive interest. Native speakers of English and
native speakers of Spanish, Latins in a bilingual program and Latins not in
a bilingual program, and Anglos in a bilingual program and Anglos not in a
bilingual program are the contrastive categories employed in this design.

Between-groups hypotheses.

In addition to the instrumentation hypotheses which have already been
presented, three other types of hypotheses have been developed as part of this
general evaluation design: within-program hypotheses, between-groups hypotheses,
and hypotheses concerning validity threats which are affected by manipulating
overlapping designs. These latter hypotheses will be reported later, when the
multiple-designs approach is explicated.

The between-groups hypotheses, along with the instrumentation hypotheses,
form the major probe area of the first 16 months of this design. The
purpose of these between-groups hypotheses is to focus clearly on how children
in bilingual programs achieve when compared to similar children who are in the
regular school curriculum. These hypotheses are graphically presented in
Table III.
Within-program hypotheses.

After probing the question of whether students learn more in a bilingual
program than they would have had they stayed in the regular school program,
there is another question to ask: How much mathematics, science, social
studies, and language arts did they learn in the experimental program?

The best way to get answers to these questions is through
criterion-referenced tests. Unfortunately, as we have already noted, these
instruments are not currently available in a form suitable for bilingual
programs. In an effort to press the selected norm-referenced instruments
(see Table I) into double service, a number of hypotheses were developed which
attempt to exploit whatever potential these instruments hold for measuring
concept mastery. A list of these hypotheses follows:

(1) Eighty percent of the students in grades K and 1, at the end
of each year, will show a mastery of 30 percent of the concepts
tested through one or more of the following instruments:

a. BOEHM Test of Basic Concepts in English (grades K-1).

b. BOEHM Test of Basic Concepts in Spanish (grades K-1).

c. Test of Basic Experiences in English Language (grades K-1).

d. Test of Basic Experiences in Spanish Language (grades K-1).

e. Test of Basic Experiences in Mathematics, tested through
Spanish (grades K-1).

f. Test of Basic Experiences in Mathematics, tested through
English (grades K-1).

(2) Assuming that a composite score on the bilingually administered
Test of Basic Experiences is a measure of bilingualism, 80 percent
of the students in grades K and 1, at the end of the year, will
show a mastery of 80 percent of the concepts tested through the
instrument. (The assumption about the composite score will be tested
through appropriate analyses of correlations among a, b, c, and d above.)

(3) Assuming that a composite score on the two forms of the BOEHM Test of
Basic Concepts (Form A, Spanish, and Form B, English) is a measure of
bilingualism, 80 percent of the students in grades K and 1, at the end of
the year, will show a mastery of 80 percent of the concepts measured by the
two instruments. (The assumption about the composite score will be tested
through appropriate analyses of correlations among a, b, c, and d above.)

(4) A statistically significant change beyond normal growth rates in the
pretest-to-posttest performance of the students in grades K and 1 will be
evidenced after five to nine months' participation in the bilingual program,
as measured by the scores on each of the following measures:

a. BOEHM Test of Basic Concepts - English

b. BOEHM Test of Basic Concepts - Spanish

c. Test of Basic Experiences - English Language

d. Test of Basic Experiences - Spanish Language

e. Test of Basic Experiences - Mathematics, tested through
English.

f. Test of Basic Experiences - Mathematics, tested through
Spanish.

(5) Participating students in grades 2 through 6, when posttested through
appropriate levels of the tests, will show one month's growth from pretest
status for every month of participation in the program, as measured
on each of the following tests:

a. English Reading (Inter-American Series)

b. Spanish Reading (Inter-American Series: Lectura)
(6) At the end of the year, 80 percent of the students in grades 2
through 6 will show a mastery of 80 percent of the concepts
tested through appropriate levels of the TOBE and BESC mathematics tests.

(7) Change in performance from the beginning of the year to the end of the
year for those students who at pretest rank in the lower quartile
on the Self-Concept/Affective Factors test will be statistically
significant at the .05 level after scores are corrected for measured
regression.
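Hypotheses (1) through (3) and (6) share a "percent of students mastering a percent of concepts" criterion, and hypothesis (5) sets a month-for-month growth target. Both can be expressed as simple checks. This is a sketch under stated assumptions: mastery is recorded as one True/False flag per concept, and growth uses grade-equivalent scores on a 10-month school year. The function names are hypothetical. The percentages are parameters, so each hypothesis can be checked with its own thresholds.

```python
def meets_mastery_criterion(mastered, student_pct=80.0, concept_pct=80.0):
    """True when at least student_pct percent of students master at least
    concept_pct percent of the concepts tested.

    mastered: one list per student of True/False flags, one flag per concept.
    """
    masters = sum(
        1 for flags in mastered
        if flags and 100.0 * sum(flags) / len(flags) >= concept_pct
    )
    return 100.0 * masters / len(mastered) >= student_pct

def meets_month_for_month(pre_ge, post_ge, months):
    """Hypothesis (5): at least one month of growth per month in the program.

    Assumes grade-equivalent scores on a 10-month school year
    (e.g. 3.2 = grade 3, month 2), so one month = 0.1 units.
    """
    months_gained = round((post_ge - pre_ge) * 10, 6)  # round away float noise
    return months_gained >= months

# Hypothetical class of five: four students master 8 of 10 concepts, one masters 5.
scores = [[True] * 8 + [False] * 2] * 4 + [[True] * 5 + [False] * 5]
print(meets_mastery_criterion(scores))     # True: 4 of 5 students = 80 percent
print(meets_month_for_month(3.2, 4.0, 8))  # True: 8 months gained in 8 months
```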

Process evaluations. The whole thrust of the evaluation design
described in this report is product oriented, with its concern for measured
cognitive achievement among Spanish-speaking children in elementary school.
Yet an evaluation of the teaching process involved in helping children achieve
is clearly relevant to an understanding of the effectiveness of a bilingual
program.

Two process evaluations are in operation. One is a teacher
self-assessment narrative, done periodically, to evaluate the effectiveness of
his teaching strategies in meeting each of the seven state goals of bilingual
education. The second process evaluation is accomplished through onsite
visitations by teams of observers. Both of these process evaluations will
be described at greater length and assessed in a subsequent report.

Anticipating design refinements for FY-74. The evaluation design
described in this report is envisioned as a developmental method to obtain
data on questions whose focus is being continually sharpened. We already
perceive a need to incorporate a greater variety of evaluative instruments
into next year's design: affective measures, new or different standardized
tests, criterion-referenced instruments, diagnostic measures, and
instruments appropriate for the secondary school level. Because of the heavy
reliance on test instruments, unobtrusive techniques need to be developed.
We anticipate short-term experiments within bilingual programs to gauge the
effect of various program subcomponents.

The plans for assessing the effect of instrumentation and design variation
on the data are being implemented. A later paper will assess
the role played by these two procedures in increasing accountability. The
question is not which design or what instrument is best for assessing
bilingual education programs, but what combination of designs and what
combination of instruments gives the most accurate picture.
TABLE I - DESCRIPTION OF INSTRUMENTS

Measuring Instrument                        Language of   Level   Grade
                                            Instrument
Test of Basic Experiences - Language        Eng/Span      K       Kinder
Test of Basic Experiences - Language        Eng/Span      L       1-2
Test of Basic Experiences - Mathematics     Eng/Span      K       Kinder
Test of Basic Experiences - Mathematics     Eng/Span      L       1-2
BOEHM Test of Basic Concepts, Form A        Spanish               K-2
BOEHM Test of Basic Concepts, Form B        English               K-2
Inter-American - Test of Reading            English       1       1
Inter-American - Test of Reading            English       2       2-3
Inter-American - Test of Reading            English       3       4-5-6
Inter-American - Test of Reading            English       4       7-8
Inter-American - Prueba de Lectura          Spanish       1       1
Inter-American - Prueba de Lectura          Spanish       2       2-3
Inter-American - Prueba de Lectura          Spanish       3       4-5-6
Inter-American - Prueba de Lectura          Spanish       4       7-8
Inter-American - General Ability            Eng/Span      1       1
Inter-American - General Ability            Eng/Span      2       2-3
Inter-American - General Ability            Eng/Span      3       4-5-6
Inter-American - General Ability            Eng/Span      4       7-8
Dailey Language Facility Test               Eng/Span              K-1
BESC - Draw-a-Circle Self-Concept           Eng/Span              K-3
BESC - Language Usage Questionnaire         Eng/Span              K-3
BESC - Demographic Questionnaire            Eng/Span              K-6
Chicago Self-Concept Scale                  Eng/Span              K-4
BESC - Test of Basic Mathematics            Eng/Span      1       2-3
BESC - Test of Basic Mathematics            Eng/Span      2       4-6
BESC - Test of Basic Mathematics            Eng/Span      3       7-8

Testing periods: 1/72, 5/72, 9/72, 1/73, 5/73.

TABLE II

STATEWIDE EVALUATION DESIGNS AND PROJECT SITES

Type of Comparison
I    Random Assignment
II   Random within Stratum
III  Parallel Schools or Classes
IV   Regression Discontinuity
     A. Program, Nonprogram Grades
     B. Random Cutoff on Needs Scale
V    Grade Cohort
VI   Stratified Student Population

Project Sites
1.  Bensenville.
2.  Bowen, Burns, Cooper Upper, Sheridan, and Sullivan.
3.  Bensenville.
4.  Agassiz, Bowen, Burns, Cooper Lower, Gary, Komensky, McCormick, Sullivan, and Thorp.
5.  Elgin, Joliet, Steger, and Waukegan.
6.  Agassiz, Bowen, Burns, Cooper Primary, Cooper Upper, Lakeview, Nash, Sheridan, Sullivan, and Headley-C.
7.  Joliet (Keith-C, Lincoln, Marsh-C, Marshall-C, and Parks).
8.  Lowell and Sheridan.
9.  ..., Moline, Steger, Waukegan, and West Chicago.
10. Irving and Nettlehorst.
11. ..., Elgin, Moline, Steger, Waukegan, and West Chicago.
12. Agassiz, Bowen, Burns, Cooper Primary, Cooper Upper, Lakeview, Nash, Sheridan, and Sullivan.
13. Arcola, Crete-Monee, Danville, Elkgrove, Marengo, Maywood, Palatine, Rockford, and Wheeling.
14. Gary, Hamline, Irving, Jungman, Komensky, Lemoyne, McCormick, Morris, Nettlehorst, Plamandon, and Thorp.
15. Elgin, Joliet, Waukegan, West Chicago, Danville, Elkgrove, Crete-Monee, and Rockford.
16. In-program Latins, not-in-program Latins, in-program Anglos, and not-in-program Anglos. (Sample from Chicago Public Schools student population in program area.)

C = Comparison School.